Goto

Collaborating Authors

 transfer matrix


SP2RINT: Spatially-Decoupled Physics-Inspired Progressive Inverse Optimization for Scalable, PDE-Constrained Meta-Optical Neural Network Training

Ma, Pingchuan, Yin, Ziang, Jing, Qi, Gao, Zhengqi, Gangi, Nicholas, Zhang, Boyang, Huang, Tsung-Wei, Huang, Zhaoran, Boning, Duane S., Yao, Yu, Gu, Jiaqi

arXiv.org Artificial Intelligence

DONNs leverage light propagation for efficient analog AI and signal processing. Advances in nanophotonic fabrication and metasurface-based wavefront engineering have opened new pathways to realize high-capacity DONNs across various spectral regimes. Training such DONN systems to determine the metasurface structures remains challenging. Heuristic methods are fast but oversimplify metasurfaces modulation, often resulting in physically unrealizable designs and significant performance degradation. Simulation-in-the-loop optimizes implementable metasurfaces via adjoint methods, but is computationally prohibitive and unscalable. To address these limitations, we propose SP2RINT, a spatially decoupled, progressive training framework that formulates DONN training as a PDE-constrained learning problem. Metasurface responses are first relaxed into freely trainable transfer matrices with a banded structure. We then progressively enforce physical constraints by alternating between transfer matrix training and adjoint-based inverse design, avoiding per-iteration PDE solves while ensuring final physical realizability. To further reduce runtime, we introduce a physics-inspired, spatially decoupled inverse design strategy based on the natural locality of field interactions. This approach partitions the metasurface into independently solvable patches, enabling scalable and parallel inverse design with system-level calibration. Evaluated across diverse DONN training tasks, SP2RINT achieves digital-comparable accuracy while being 1825 times faster than simulation-in-the-loop approaches. By bridging the gap between abstract DONN models and implementable photonic hardware, SP2RINT enables scalable, high-performance training of physically realizable meta-optical neural systems. Our code is available at https://github.com/ScopeX-ASU/SP2RINT


A Centralized-Distributed Transfer Model for Cross-Domain Recommendation Based on Multi-Source Heterogeneous Transfer Learning

Xu, Ke, Wang, Ziliang, Zheng, Wei, Ma, Yuhao, Wang, Chenglin, Jiang, Nengxue, Cao, Cai

arXiv.org Artificial Intelligence

Cross-domain recommendation (CDR) methods are proposed to tackle the sparsity problem in click through rate (CTR) estimation. Existing CDR methods directly transfer knowledge from the source domains to the target domain and ignore the heterogeneities among domains, including feature dimensional heterogeneity and latent space heterogeneity, which may lead to negative transfer. Besides, most of the existing methods are based on single-source transfer, which cannot simultaneously utilize knowledge from multiple source domains to further improve the model performance in the target domain. In this paper, we propose a centralized-distributed transfer model (CDTM) for CDR based on multi-source heterogeneous transfer learning. To address the issue of feature dimension heterogeneity, we build a dual embedding structure: domain specific embedding (DSE) and global shared embedding (GSE) to model the feature representation in the single domain and the commonalities in the global space,separately. To solve the latent space heterogeneity, the transfer matrix and attention mechanism are used to map and combine DSE and GSE adaptively. Extensive offline and online experiments demonstrate the effectiveness of our model.


What if LLMs Have Different World Views: Simulating Alien Civilizations with LLM-based Agents

Jin, Mingyu, Wang, Beichen, Xue, Zhaoqian, Zhu, Suiyuan, Hua, Wenyue, Tang, Hua, Mei, Kai, Du, Mengnan, Zhang, Yongfeng

arXiv.org Artificial Intelligence

In this study, we introduce "CosmoAgent," an innovative artificial intelligence framework utilizing Large Language Models (LLMs) to simulate complex interactions between human and extraterrestrial civilizations, with a special emphasis on Stephen Hawking's cautionary advice about not sending radio signals haphazardly into the universe. The goal is to assess the feasibility of peaceful coexistence while considering potential risks that could threaten well-intentioned civilizations. Employing mathematical models and state transition matrices, our approach quantitatively evaluates the development trajectories of civilizations, offering insights into future decision-making at critical points of growth and saturation. Furthermore, the paper acknowledges the vast diversity in potential living conditions across the universe, which could foster unique cosmologies, ethical codes, and worldviews among various civilizations. Recognizing the Earth-centric bias inherent in current LLM designs, we propose the novel concept of using LLMs with diverse ethical paradigms and simulating interactions between entities with distinct moral principles. This innovative research provides a new way to understand complex inter-civilizational dynamics, expanding our perspective while pioneering novel strategies for conflict resolution, crucial for preventing interstellar conflicts. We have also released the code and datasets to enable further academic investigation into this interesting area of research.


M3ICRO: Machine Learning-Enabled Compact Photonic Tensor Core based on PRogrammable Multi-Operand Multimode Interference

Gu, Jiaqi, Zhu, Hanqing, Feng, Chenghao, Jiang, Zixuan, Chen, Ray T., Pan, David Z.

arXiv.org Artificial Intelligence

Photonic computing shows promise for transformative advancements in machine learning (ML) acceleration, offering ultra-fast speed, massive parallelism, and high energy efficiency. However, current photonic tensor core (PTC) designs based on standard optical components hinder scalability and compute density due to their large spatial footprint. To address this, we propose an ultra-compact PTC using customized programmable multi-operand multimode interference (MOMMI) devices, named M3ICRO. The programmable MOMMI leverages the intrinsic light propagation principle, providing a single-device programmable matrix unit beyond the conventional computing paradigm of one multiply-accumulate (MAC) operation per device. To overcome the optimization difficulty of customized devices that often requires time-consuming simulation, we apply ML for optics to predict the device behavior and enable a differentiable optimization flow. We thoroughly investigate the reconfigurability and matrix expressivity of our customized PTC, and introduce a novel block unfolding method to fully exploit the computing capabilities of a complex-valued PTC for near-universal real-valued linear transformations. Extensive evaluations demonstrate that M3ICRO achieves a 3.4-9.6x smaller footprint, 1.6-4.4x higher speed, 10.6-42x higher compute density, 3.7-12x higher system throughput, and superior noise robustness compared to state-of-the-art coherent PTC designs, while maintaining close-to-digital task accuracy across various ML benchmarks. Our code is open-sourced at https://github.com/JeremieMelo/M3ICRO-MOMMI.


Learning Quantum Processes and Hamiltonians via the Pauli Transfer Matrix

Caro, Matthias C.

arXiv.org Artificial Intelligence

Learning about physical systems from quantum-enhanced experiments, relying on a quantum memory and quantum processing, can outperform learning from experiments in which only classical memory and processing are available. Whereas quantum advantages have been established for a variety of state learning tasks, quantum process learning allows for comparable advantages only with a careful problem formulation and is less understood. We establish an exponential quantum advantage for learning an unknown $n$-qubit quantum process $\mathcal{N}$. We show that a quantum memory allows to efficiently solve the following tasks: (a) learning the Pauli transfer matrix of an arbitrary $\mathcal{N}$, (b) predicting expectation values of bounded Pauli-sparse observables measured on the output of an arbitrary $\mathcal{N}$ upon input of a Pauli-sparse state, and (c) predicting expectation values of arbitrary bounded observables measured on the output of an unknown $\mathcal{N}$ with sparse Pauli transfer matrix upon input of an arbitrary state. With quantum memory, these tasks can be solved using linearly-in-$n$ many copies of the Choi state of $\mathcal{N}$, and even time-efficiently in the case of (b). In contrast, any learner without quantum memory requires exponentially-in-$n$ many queries, even when querying $\mathcal{N}$ on subsystems of adaptively chosen states and performing adaptively chosen measurements. In proving this separation, we extend existing shadow tomography upper and lower bounds from states to channels via the Choi-Jamiolkowski isomorphism. Moreover, we combine Pauli transfer matrix learning with polynomial interpolation techniques to develop a procedure for learning arbitrary Hamiltonians, which may have non-local all-to-all interactions, from short-time dynamics. Our results highlight the power of quantum-enhanced experiments for learning highly complex quantum dynamics.


Sample, computation vs storage tradeoffs for classification using tensor subspace models

Chaghazardi, Mohammadhossein, Aeron, Shuchin

arXiv.org Machine Learning

Our main tool is the use of tensor subspaces, i.e. subspaces with a Kronecker structure, for embedding the data into lower dimensions. Among the subspaces with a Kronecker structure, we show that using subspaces with a hierarchical structure for representing data leads to improved tradeoffs. One of the main reasons for the improvement is that embedding data into these hierarchical Kronecker structured subspaces prevents overfitting at higher latent dimensions.


Knowledge Graph Completion with Adaptive Sparse Transfer Matrix

Ji, Guoliang (Institute of Automation, Chinese Academy of Sciences) | Liu, Kang (Institute of Automation, Chinese Academy of Sciences) | He, Shizhu (Institute of Automation, Chinese Academy of Sciences) | Zhao, Jun (Institute of Automation, Chinese Academy of Sciences)

AAAI Conferences

We model knowledge graphs for their completion by encoding each entity and relation into a numerical space. All previous work including Trans(E, H, R, and D) ignore the heterogeneity (some relations link many entity pairs and others do not) and the imbalance (the number of head entities and that of tail entities in a relation could be different) of knowledge graphs. In this paper, we propose a novel approach TranSparse to deal with the two issues. In TranSparse, transfer matrices are replaced by adaptive sparse matrices, whose sparse degrees are determined by the number of entities (or entity pairs) linked by relations. In experiments, we design structured and unstructured sparse patterns for transfer matrices and analyze their advantages and disadvantages. We evaluate our approach on triplet classification and link prediction tasks. Experimental results show that TranSparse outperforms Trans(E, H, R, and D) significantly, and achieves state-of-the-art performance.


A Bayesian Approach to Sparse plus Low rank Network Identification

Zorzi, Mattia, Chiuso, Alessandro

arXiv.org Machine Learning

We consider the problem of modeling multivariate time series with parsimonious dynamical models which can be represented as sparse dynamic Bayesian networks with few latent nodes. This structure translates into a sparse plus low rank model. In this paper, we propose a Gaussian regression approach to identify such a model.


COT: Contextual Operating Tensor for Context-Aware Recommender Systems

Liu, Qiang (Institute of Automation, Chinese Academy of Sciences) | Wu, Shu (Institute of Automation, Chinese Academy of Sciences) | Wang, Liang (Institute of Automation, Chinese Academy of Sciences)

AAAI Conferences

With rapid growth of information on the internet, recommender systems become fundamental for helping users alleviate the problem of information overload. Since contextual information can be used as a significant factor in modeling user behavior, various context-aware recommendation methods are proposed. However, the state-of-the-art context modeling methods treat contexts as other dimensions similar to the dimensions of users and items, and cannot capture the special semantic operation of contexts. On the other hand, some works on multi-domain relation prediction can be used for the context-aware recommendation, but they have problems in generating recommendation under a large amount of contextual information. In this work, we propose Contextual Operating Tensor (COT) model, which represents the common semantic effects of contexts as a contextual operating tensor and represents a context as a latent vector. Then, to model the semantic operation of a context combination, we generate contextual operating matrix from the contextual operating tensor and latent vectors of contexts. Thus latent vectors of users and items can be operated by the contextual operating matrices. Experimental results show that the proposed COT model yields significant improvements over the competitive compared methods on three typical datasets, i.e., Food, Adom and Movielens-1M datasets.